Identification of nonlinear noisy dynamics of an ecosystem from 



observations of one of its trajectory components 

V.N. Smelyanskiy 1 *, D.G. Luchinsky 2 ' 1 , and M. Millons 2 
1 NASA Ames Research Center, Mail Stop 269-2, Moffett Field, CA 94035, USA and 
Mission Critical Technologies Inc., 204 1 Rosecrans Ave. Suite 225 El Segundo, CA 9024^ 

(Dated: February 2, 2008) 

The problem of determining dynamical models and trajectories that describe ob- 
served time-series data (dynamical inference) allowing for the understanding, predic- 
tion and possibly control of complex systems in nature is one of very great interest 
in a wide variety of fields. Often, however, in multidimensional systems only part of 
the system's dynamical variables can be measured. Furthermore, the measurements 
are usually corrupted by noise and the dynamics is complicated by an interplay of 
nonlinearity and random perturbations. The problem of dynamical inference in these 
general settings is challenging researchers for decades. We solve this problem by ap- 
plying a path-integral approach to fluctuational dynamics 0, 0h an d show that, 
given the measurements, the system trajectory can be obtained from the solution 
of the certain auxiliary Hamiltonian problem in which measured data act effectively 
as a control force driving the estimated trajectory toward the most probable that 
provides a minimum to certain mechanical action. The dependance of the mini- 
mum action on the model parameters determines the statistical distribution in the 
model space consistent with the measurements. We illustrate the efficiency of the 
approach by solving an intensively studied problem from the population dynamics 
of predator-prey system 0] where the prey populations may be observed while the 
predator populations or even their number is difficult or impossible to estimate. We 
emphasize that the predator-prey dynamics is fully nonlinear, perturbed stochas- 
tically by environmental factors and is not known beforehand (see e.g. 0). No 
overall solution was previously available for this problem even in the deterministic 
case 0, Q| • We apply our approach to recover both the unknown dynamics of preda- 
tors and model parameters (including parameters that are traditionally very difficult 
to estimate) directly from measurements of the prey dynamics. We provide a com- 
parison of our method with the Markov Chain Monte Carlo technique. As a further 
test of the method we demonstrate the reconstruction of the dynamics of chaotic 
Lorenz attractor driven by noise from measurements of only one if its trajectory 
component. 
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I. INTRODUCTION 

For quantitative understanding, predicting, and controlling time-varying phenomena 
it is necessary to relate observations to a mathematical model of a system dynamics. 
In a great number of important problems such model is multidimensional, nonlinear, 
stochastic and not known from "first principles". Furthermore, often only part of the 
system's variables can be measured and these measurements are corrupted by noise. The 
rest of the system variables are invisible, or hidden. In these settings, perhaps the most 
fundamentally difficult unsolved problem of dynamical inference is how and to which 
extent one can learn both the model parameters and system trajectory from a given set of 
incomplete trajectory measurements. A solution of this problem is of importance across 
many di scip lines. Examples range from molecular motors j3| to coupled matter-radiation 
Q (see e.g. Hfl 



system 



12, 



13| for further examples). 
Here we present a solution to this problem using a path-integral approach to fluctu- 
ational dynamics fl]]. We show that, given the measurements, the most probable system 
trajectory can be obtained from finding the minimum of the mechanical action of a certain 
auxiliary Hamiltonian system under properly defined boundary conditions. The depen- 
dence of the minimum action on the system model parameters determines the statistical 
likelihood of different parametric models. 

The method is used to solve an intensively studied problem from the population dy- 
namics of the predator-prey system^, ^| where the cyclic dynamics of populations of 



small rodents is observed in Kilpisjarvi, Finnish Lapland, 1952-1992 (see Fig. ^a)) 
while the number of predators is difficult or impossible to estimate. The predator-prey 
dynamics is fully nonlinear subject to seasonal and random perturbations. This is a clas- 
sical longstanding problem in ecology [16| and epidemiology (see e.g. [ill)- m particular, 
the cited database accumulates nearly 5000 individual datasets with similar structure 
collected over more then 150 years of research. It is shown that the proposed approach 
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allows to recover both the unknown dynamics of predators and model parameters directly 
from measurements of the prey dynamics. 



40 
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FIG. 1: (a) Stochastic trajectory of the population dynamics of small rodents observed in 
Kilpisjarvi, Finnish Lapland, 1952-1992 I s shown by yellow dots. Black solid line is shown 
to guide an eye. Dashed lines shows the solution of the optimization problem, (b) Black solid 
line shows recovered hidden dynamics of the population of the specialists predators obtained by 
varying parameters r and rK'/K of the model (0. Parameters used to obtain these results are: 
r = 5.2 ± 2.5, rK'/K = -5.2 ± 2.5; s = 1.2; a = 15; g = 0.1; el = .8; e2 = .5; K = 90; Q = 
30; o n = 0.02; a p = 0.02. The insert shows the cross-section of the weighted distribution of the 
dynamical trajectories for the year 1956 indicated by the arrow in the main figure. 



II. PATH-INTEGRAL APPROACH TO THE PROBLEM OF DYNAMICAL 

INFERENCE. 

To formalize the discussion above note that in a typical experimental situation we 
observe M-dimensional time series of signals y = {y(t n ) = y(^o + hn), n = 0,/C}, with 
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the sampling step h. The unknown is the actual L-dimensional dynamical trajectory of 
the system x(t). We are interested in the case where M < L so some of the trajectory 
components are hidden. Quantitative understanding of the time-varying phenomena un- 
derlying y requires, in general, an expert input into the observed data in the form of 
a mathematical modelling framework for the system dynamics and for the measurement 
scheme. A commonly used dynamical and measurement equations for nonlinear models 
in the presence of random perturbations that apply to but by far not limited by the 
predator-prey ecological system described above are 

±i(t) = ^(x(t), c) + &(t), £(0) = D i5 8{t - 0, (1) 

L 

y h (t) = B ki Xi(t) + Pk(t), (/3 k (t) A(0) = N M 5(t - 0- (2) 
i=i 

Here we introduced continues-time interpolations yk(t) for components of the observed 
time-series y. This approximation can often be justified for a sufficiently small sampling 
interval h and a large number of data points, fC 1, in the time-series y. In (JTJ) Xi(t) (i = 
1 : L) are dynamical variables composing a vector x(t) that describes an instantaneous 
state of the system. The system dynamics in is governed by L-dimensional vector field 
with components depending on the set of parameters {c a } = c and white Gaussian 
process with zero- mean components characterized by a L x L correlation matrix 

D. The deterministic part of the measurement equation in Q is described by M x L 
measurement matrix B and the measurement error is described by the white Gaussian 
process with zero-mean components {(3 m {t)} and M x M measurement noise matrix N nm 
(M < L). Overall, the dynamical and measurement model (0) is characterized by the full 
set of the unknown parameters {M. a } = { c , B, D, N}. 

Due to the presence of dynamical and measurement noise the problem of dynamical in- 
ference must be cast in probabilistic terms. This can be done within a general framework 
of the Baeysian statistical approach Q] . A key statistical quantity is a so-called likeli- 
hood probability density functional (LPDF) Vy[x.(t); Ai]. It represents a joint probability 
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density that the system trajectory is x(t) and the system parameter values are {-M a } 
conditioned on the observed time-series y. We emphasize that in a real physical process 
the system has a distinct trajectory and parameter values. In this regard the LPDF rep- 
resents a degree of uncertainty in our knowledge about these quantities obtained from the 
measurements and assuming some basic properties of the system ffuctuational dynamics 

H- 

The explicit form of the LPDF can be obtained from using the path- 
integral approach to ffuctuational dynamics We write V y [x{t);M] = 
Ay exp (— Sy[x.(t); M.]), where Ay is a normalization constant and a negative log- 
likelihood functional Sy is obtained in the Appendix A 



Sy[x(t);M) = l£ dt (y(*) -Bx^r 1 (y(t) -Bx(t)) + V • K(x(t),c) 
+ (x(t) - K(x(t), c)) T ET 1 ^) - K(x(t), c))] + - In (det(D) det(N)) . (3) 

Here T = Kh is a time length of the data record y. In what following we shall focus 
on the case M < L that implies the existence of hidden variables. We note that despite 
hidden dynamical variables are not measured directly the functional S y fl^J) still depends 
on them explicitly because of the dynamic coupling between the variables imposed by the 
force field K. 

In many practically important cases available recording of a system trajectory, while 
containing only a part of dynamical variables, has sufficiently small time-step, long time 
duration and limited noise characteristics. Such measurements can provide a strong in- 
formation that is sufficient to pin down both key model parameters of the system and 
its trajectory, or at least to extract strong correlations between them. In these cases the 
joint LPDF P y (x(t),J\4) will be well localized in the vicinity of one or more of its maxima 
where 8S/8x.(t) = and {dS/dAi a = 0}. In the case where a single maximum dominates 
LPDF its position corresponds to the trajectory x opt (£) and parameter values {-M° pt } 
that the system most probably has, given the measurements y. 
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We now put forth a new paradigm in which a solution of the dynamical inference 
problem with hidden variables is obtained via the calculus of variations for the functional 
Sy(x.(t),A4). The power of this approach is in its simplicity, efficiency and an insight that 
it provides to the solution of dynamical inference problem by drawing a close connection 
to the methods and concepts of classical mechanics, in particular, a least action principle. 

We search for the minimum of Sy(x.(t),A4) by alternatively computing the expected 
values of x(t) and model parameters in A4 from the solution of the two variational prob- 
lems = and = 0. The first condition corresponds to a solution of the boundary 
value problem for an auxiliary mechanical system with the coordinate x, momentum p 
and a Hamiltonian function H (x, p) 

tf(x,p) = --(y-Bx) N-^y-Bx) - -— + Kp + -p T Dp, (4) 
p = D- 1 (x(t)-K). (5) 



We look for the solution of the Hamiltonian equations 

X = K+Dp ' p =2&r-ar p -( y - Bx ) N B (6) 

that satisfy the boundary conditions p(0) = p(i) = 0. If several solutions exist we choose 
the one providing a minimum of a functional Sy[x.(t); A4] playing a role of a mechanical 
action. We then fix the inferred trajectory x(i) and update the value of the parameters in 
the set A4, using analytical solution of the second variational problem, = 0, developed 
in our earlier research 



I (see Appendix A for details). This procedure is repeated 
iteratively until the desired convergence is achieved. The outcome of this algorithm is 
the most probable system trajectory x opt (t) and model parameters A4 opt . The measure 
of their fitness to the observed data y is tx exp (S opt ) where the globally minimum action 
S opt = Sy^it); M opt }. 

The Bayesian approach for dynamical inference was initially proposed by Meyer and 
Christensenin for the case where all variables were directly observed. The previous 



work on this subject (see e.g. [y, UjJ, |2C| |2l|) was exclusively focusing on brute force 



numerical methods, such as Markov Chain Monte Carlo (MCMC). However our detailed 
study of MCMC approach for the problem of dynamical inference with hidden variables 
has shown that the functional Sy(x(t),Ai) has multiple deep spurious minima in the space 
of piece- wise continuous trajectories {x(t m ); m = 1 : M}. These minima occur due to the 
contributions to the cost functions from the terms of the order of _ jf one 

starts from a poor guess about both the system trajectory and model parameters MCMC 
search stacks in those minima and takes a prohibitively large time to converge (see Sec. 
for details). In contrary, our approach avoids those spurious minima because the solution 
of the Hamiltonian boundary value problem (jUJ) is achieved via large smooth variations in 
the space of continuous trajectories. This key finding reflects a basic property of a hidden- 
variable and parameter inference in a noisy dynamical system with continuous vector field 
K(x, C): expected value of the inferred trajectory x opt (t) is a smoothly varying function 
of time whereas the measured signal y(i) is not. 

We note that the likelihood distribution around the maximum is determined by the 
second variation 5 2 Sy of the action with respect to the both x(£) and M. computed at 
its minimum. In many cases, in particular in the case of a multi-modal LPDF, it is of 
interested to explicitly study the full shape of the LPDF in a reduced subspace of the 
model parameters while marginalizing the LPDF with respect to the other parameters 
and the system trajectory. Within our approach this can be handily done by computing 
the minimum action S opt (ci,C2) using the above algorithm for different values of the 
parameters (ci,Cz). 

However in many complex cases where observational and model errors are significant 
and hidden variables are present LPDF can have a very large number of local minima. In 
this case the more informative quantity is a distribution of local minima. To obtain it we 
pick at random some values of (ci, c 2 ) and converge to the nearest point (c[, c' 2 ) of a local 
minimum of the action S opt (ci, c 2 ) where the conditions -jM* = and = are satisfied. 
We then repeat this procedure many times for different starting values of (01.02). Then 
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the histogram of the local minima (c' 1; c' 2 ) weighed with the factors exp(— S , opt (c / 1 , c' 2 ) and 
appropriately normalized gives the distribution of local minima P = Py(ci, C2). We will 
demonstrate this approach in Sec. IIIII for the inference of the population dynamics. 

Finally, we emphasize that the prerequisite of the approach considered in this section 
is that LPDF computed at any set M. of relevant parameter values has a sharp peak in 
the space of the system paths x(£) at some x opt (t) that depends on M. (cf. Fig. |3J). 



III. INFERENCE OF PREDATOR-PREY MODEL. 



We now apply the method described above to reconstruction of the unknown predator 
dynamics and model parameters from the observed oscillations of small rodents in Finish 
Lapland. The observed time-series data is shown by yellow circles in the Fig. 1(a). To 
formulate the problem we first briefly summarize an expert input into observed data, see 



e.g. 



uate tne propiem we nrst Dneny summarize an cxpcr 



that the most likely predators 



to potentially maintain oscillatory dynamics in rodent populations are small mustelids, 
weasels, and stoats which are notoriously difficult to observe and study in the field. It 
was further argued 0, Q that in addition to the dominating effect of these so-called 
specialist predators the population of rodents is strongly affected by generalists predators 
(such as foxes, owls and skua) and by seasonal (periodic) and stochastic variations of the 
environment. Based on these arguments the following equations were introduced to model 
observed ecological time series 

n m2 CNP 

N — rN (1 — e x sin(27rt) + <7 n £ n (t)) — (r/K)N 2 — 



N 2 + H 2 N + D' 
P 2 

P = sP(l-e 2 sin(27rt) + a^ p (t)) - sQ — . (7) 

Here the state of the system is characterized by the dynamical variables N and P, cor- 
responding to the density of rodents and predators, respectively. Taking into account 
a log-normal distribution of the measurement errors the measured rodent density N' is 
related to the actual (unknown) value iV via N' = N exp(a f, s r](t)) where r)(t) is a white 
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FIG. 2: Result of the direct comparison of the path-integral and MCMC techniques for (a) 
observed variable x\(t) and (b) hidden variable Xzif). The actual dynamical trajectories x\(t) 
and xz(t) are shown by solid lines. The measured trajectory yx(t) is shown by dashed black line 
in figure (a) and is taken as initial guess for the solution x±(t). For an unobservable trajectory 
X2(t) initial guess is taken to be (t) = 0. The solution of the boundary value problem is shown 
by yellow circles. The MCMC solution is shown by red squares. The inset in the figure (a) 
shows the variation of the cost function as a function of time for MCMC algorithm (black dots) 
and for boundary value method (yellow circle). 

Gaussian noise of unit intensity. The predator density is not measured so the variable P 
is hidden. In (J7J) £p(t)} is a zero-mean white Gaussian vector of dynamical noise. 

The precise functional form is known neither for predation nor for numerical response 
of the predators and some modifications of the equations (J2J) where considered in the 
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literature 

The problem of dynamical inference is the following: Use 80 experimental points of 
corrupted by noise measurements to recover both hidden dynamics of predators P = P(t) 
and the model of the nonlinear stochastic dynamics of small rodent in Fennoscandia 
represented by the full set of parameters from Eq. ((Zj) and a os b. Since there were no general 
methods to recover neither hidden dynamics nor nonlinear models of stochastic systems 
it was a,wa yS — (see ,, Q Q) that the g oa, to obtain solution of this probl e m is 
unrealistic and no attempt was made to solve it in the earlier research. Instead a number 
of models were developed 0, |^ from the first ecological principles and from the 

extensive field studies of the small rodents ecology. The outcome of the simulation of these 
models was compared to the experimental points to decide whether or not the model is 
capable of producing reasonable predictions. This approach although very valuable and 
often the only one available in practice has very limited statistical significance and can 
hardly be generalized. 

The method introduced above provides a general and effective alternative approach to 
a solution of this ecological problem. First, we map the predator-pray model (J7J) directly 
onto the dynamical model with additive white noise considered in Eqs. ([!} by making the 
change of variables: xi(t) = log(N(t)/K') and X2(t) = \og(Q'P/K') (here some known 
nominal values are used for the scaling coefficients K' and Q'). Then the full set of 
unknown parameters c = {r, s, ei, e 2 , K, G, C, Q, H, D, a n , a p , <r b s } and the trajectory of 
the predator density P = P(t) is inferred from the observed data using the dynamical 
inference scheme described above (for the sca ling of dynamical equations and precise 
ecological meaning of these parameters see j^, ll<4 l^fil]). To present the solution of this 
inference problem we investigate the marginalized LPDF as a function of key model 
parameters that are notoriously difficult to estimate using other techniques: carrying 
capacity K and equilibrium ration between two populations Q (see online supplement 
material for further details). 

The results are shown in the Fig^and Fig. El It can be seen from the Fig^(a) that the 
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FIG. 3: (a) Weighted distribution of the inferred values of the model parameters r and rK'/K. 
(b) The same distribution top view, (c) Weighted distribution of the inferred values of the model 
parameters s and sQ/Q' . (d) Top view of the same distribution. 



model (|7J) can fit experimental data very well in a wide range of values of e.g. parameters 
r and rK'/K. This gives rise to a broad distribution of the possible dynamical trajectories 
of hidden predators shown in the FigQ] (b). However, the likelihood functions of various 
trajectories are exponentially different. This fact is taken into account by weighting the 
corresponding distributions of the model parameters with the factor exp(— Sy [x(t),c]). 
The weighted distributions of trajectories and model parameters is the main outcome of 
the statistical analysis of the ecological experimental data. 

The weighted joint distributions of the inferred pairs of parameters (r, r/K) and 
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(s, sQ/Q') are shown in the Fig. El Analysis of these distributions gives the following 
estimates of the model parameters r = 5.69 ±0.49, rK'/K = — 6. 0494 ± 1.25, K — 76 ± 17 
s = 1.08 ± 0.31, sQ/Q' = -1.17 ± 0.50, Q = 43 ± 22, g = 0.12 ± 0.3, a = 13.2 ± 2.5, 
ei = 1.4 ± .4, and eo = 1 ± .5 which are close to the values considered in the earlier 
ecolo.cal research g Q Q Q. At the sa-ne thne static, a^sis revea, that 
distributions for the parameters {H, D} (in the range of values ???) are very flat and 
further information is needed for a more accurate estimate of their values. 



IV. LEMMING OSCILLATIONS IN THE HIGH- ARCTIC TUNDRA IN 

GREENLAND 

The method can be further verified by analyzing experimental data obtained for the 
small rodents-predators community in high-arctic Greenland j^j. This data is very similar 
to the data collected in Fennoscandia with a important exception, namely, dynamics of 
both populations prey and predator was recorded very carefully in Greenland. Therefore, 
it has become possible to check if the predator dynamics reconstructed from the prey 
population alone coincides with the actual observations of the predator time series. The 
experimental data (see Fig. 3) for the population oscillation in the lemming-stoat predator 
prey community were collected in the high- Arctic tundra in Greenland in 1988-2002 2^|. 
The time-variation of the predator and prey oscillations are also influenced by a number 
of generalists predators such as arctic fox, snowy owl, and long-tailed skua. We note 
that a very detailed model with experimentally measured numerical responses of various 
predators is available [26( to simulate this data. However, to analyze hidden predator 
population along the lines outlined in the previous section we notice that the organization 
of the lemming-stoat community in the high-arctic Greenland is very similar to the vole- 
weasel community in Finnish Lapland. So we attempt to fit the lemmin g-st oat population 
oscillations using the model (JZJ) developed for the latter community 

In this model the populations are scaled log(N/K') and y = log(^f) with some 

assumed values of the carrying capacity K' and proportionality constant Q' (K' and Q' 
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FIG. 4: Lemming (a) and (b) stoat population (individuals/ha) observed in the high-arctic 
tundra in 1988-2002 26] are shown by yellow circles. The thing solid line is shown to guide the 
eye. Dashed black lines show population dynamics inferred using model (jSJ) under assumption 
that the dynamics of both populations was measured in the experiment with measurement error 
0.2. Dashdot lines show population dynamics inferred using model (JHJ) under assumption that 
only the dynamics of prey populations was measured in the experiment with measurement error 
0.2 and the predator dynamics is hidden. 

are known, while actual values K and Q are not known and have to be inferred) . The 
time- variations of x(t) and y(t) are described by the following set of equations (l3j 

K' e x e x 

x = r (1 - ei sin(27rf)) - r—e x - g ^ - ^ - a^^, +D n Ut) 

y = s(l-e 2 sin(27rf)) - s^~ x + D p £ p (t). (8) 

The parameters of these equations have the following [13^] meaning. The vole population 
is characterized by: (i) the intrinsic rate of the vole population growth r with possible 
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values in the range: 4-7 yr" 1 ; (ii) the dimensionless amplitude of seasonal forcing e 
with range: 0.5 - 1; (iii) prey carrying capacity K with range: 100 - 300 voles ha -1 . 
The specialist predator population is described by: (i) intrinsic rate of weasel population 
growth s with range: 1- 1.5 yr" 1 ; (ii) minimum consumption per predator C with range: 
500-700 voles yr" 1 weasel -1 ; (iii) half saturation constant D with range: 5-6 voles ha" 1 ; 
(iv) predator-prey constant ratio Q with range: 20-40 voles weasel" 1 . The generalist 
predation is characterized by: (i) the maximum rate of mortality G with range: 70 - 125 
voles ha" 1 yr" 1 and (ii) half-saturation prey density H with range: 11-16 voles ha -1 . 

First, we try to fit this model to the experimental data taking into account measure- 
ments of both populations. To avoid the problem related to the fact that continuous 
model is being fitted to the experimental points measured only once a year we interpolate 
experimental points for predator and prey using a piecewise cubic Hermite interpolation 
with time step h = 0.001 year. The corresponding results of the fit are shown by the 
dashed black line in the Fig. Ufa) and (b). We note that the model (JB1) can fit very well 
experimental data. However, this agreement has a limited statistical significance since we 
are fitting 30 experimental points by a nonlinear model with 18 parameters. It turns out 
that the same experimental data can be well fit in a broad range of the values of the model 
parameters. A detailed study of the landscape of the log-likelihood function is needed 
to choose the most probable model. We defer this study to a future publication. In the 
present study our main goal is to verify that even in the absence of the measurements of 
the predator population we can still recover both hidden dynamics of the predator and 
the model parameters although with degraded accuracy. 

To this end we now infer both hidden dynamics of the stoat population and the model 
parameters assuming that only population of lemming was measured. The corresponding 
inference results are shown in the Fig. Ufa) and (b) by the red dash-dot lines. The values 
of the parameters inferred in both cases are summarized in the Table [I] 

We conclude that even in the case of incomplete corrupted by noise measurements 
of the population dynamics our method allows one to recover both hidden dynamics of 
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TABLE I: Inference results for the parameters of the model (jHJ) obtained by two methods: 
Value I were obtained assuming that both populations (lemming and stoat) were measured; 
Value II were obtained assuming that only lemming population were measured. Experimental 
points in both cases were interpolated using piecewise cubic Hermite interpolation with time 
step h = 0.001 year. 



Parameter 


Value I 


Value II 


r 


2.24 


4.53 


s 


0.76 


0.99 


rK'/K 


-4.07 


-7.26 


a 


-7.96 


-14.18 


9 


0.41 


0.39 


sQ/Q' 


-0.82 


-0.95 


en 


0.63 


0.35 


e 2 i 


0.32 


0.09 


ei2 


0.28 


0.40 


e 2 2 


-0.21 


-0.29 



invisible predator and the model parameters. 

A. 3D distribution of the predator trajectories 

Finally we analyze a distribution of the most probable predator trajectories for dif- 
ferent model parameter values taken at multiple minima of LPDF discussed above (see 
Fig. IHJ). We search for local minima of LPDF with respect to the set of coefficients A4 = 
{r, rK'/K, s, sQ/Q'} in the region {5.2000 ± 4, -5.2000 ± 4, 1.2000 ± 1.1, -1.2000 ±1.1}. 
At each local minima M! we find the most probable predator trajectory x opt (£) by solving 
a boundary value problem described above and attach the statistical weight to this tra- 
jectory oc exp(— S , (x opt (t), M.'). The resulting 3D distribution of the weighted predator 
trajectories is shown in the FigEl 
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FIG. 5: (a) probaility distribution of the predator trajectories at local minima of LPDF; (b) 
the corresponding contour plot. Local minima of LPDF with respect to four model parameters 
were sampled in the region: r = {5.2000 ± 4, rK'/K = -5.2000 db 4, s = 1.2000 ± 1.1, sQ/Q' = 
-1.2000 ± 1.1}. Other parameters in this test were: C/Q' = 20.9223, GfK' = 0.2401, re x = 
2.2493, se 2 = 0.5027, (H/K') 2 = 0.04, D/K' = 0.04. 
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V. COMPARISON OF PATH-INTEGRAL BASED INFERENCE WITH 
MARKOV CHAIN MONTE CARLO METHOD 

To compare directly the results of the reconstruction obtained by the path-integral 
method and by MC algorithm we simplify problem and consider oscillations in a two- 
dimensional system of the form 

ii = 1.5x 2 + x\x 2 - 0.2x? + V^h£i(0> 

(9) 

x 2 = ~x 1 + e(l - x\)x 2 + y/D 2 2^2(t), 

where e = 0.1 and D\\ = D22 = 0.04. We assume that only x\(t) is measured with 
measurement noise rj (t) of intensity iV = 0.2 to produce an observed time series 



y 1 (t)=x 1 (t) + VNr l (t), 

while the second variable is missing. We find maximum of the posterior PDF p ps (x, -M|y) 
in the space of dynamical trajectories {x(i)} by applying two methods: path-integral 
approach as described above and Markov Chain Mote Carlo (MCMC) using Metropolis- 
Hastings algorithm within Gibbs sampling scheme (see e.g. |38|). The results are shown 
in the Fig. 4. It can be seen from the figure that the MCMC algorithm can indeed be 
used to reconstruct dynamical trajectory from the noisy measurements. However, in the 
case of missing variable the MCMC fails to recover correct solution. The reason is that 
the later requires large smooth variations of the trajectory, while the MCMC algorithm 
is searching in the space of discontinuous nondifferentiable trajectories and as a result 
converges to multiple deep spurious minima produced by the terms of the order ^ Xk +^ Xk ^ 
in the cost function (JHJ). Similar problem appears already in deterministic case [2J, where 
the multiple shooting technique is applied to solve the problem. We note that our approach 
is more general. It is valid both in stochastic and deterministic case and avoids logistic 
and technical problems related to dividing trajectory on arbitrary number of piece- wise 
continuous solutions and on gluing these solution together. 
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FIG. 6: Result of the direct comparison of the path-integral and MCMC techniques for (a) 
observed variable x\(t) and (b) hidden variable Xzif). The actual dynamical trajectories x\(t) 
and xz(t) are shown by solid lines. The measured trajectory yx(t) is shown by dashed black line 
in figure (a) and is taken as initial guess for the solution x±(t). For an unobservable trajectory 
X2(t) initial guess is taken to be (t) = 0. The solution of the boundary value problem is shown 
by yellow circles. The MCMC solution is shown by red squares. The inset in the figure (a) 
shows the variation of the cost function as a function of time for MCMC algorithm (black dots) 
and for boundary value method (yellow circle). 



A. Lorenz attarctor 



WE found our method to be sufficiently robust to work in the case of more then 
one hidden variable. To demonstrate this we consider the archetypical chaotic nonlinear 
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system of Lorenz, 

x\ = cr(x 2 -Xx) + &(*), 

x 2 = rxt - x 2 - xix 3 + £ 2 (£), ( 10 ) 

x 3 = x x x 2 - bx 3 + £ 3 (t), 

driven by zero-mean white Gaussian noise processes with covariance (£i(t) &(t')) = 
Dw S(t — t'). Synthetic data (with no measurement noise) were generated by simulating 
fTOj) using the standard parameter set a = 10, r = 28, b = |, and for various levels of 
dynamical noise intensities. It is assumed that the trajectory component x(t) shown in 
Fig-Ela) is measured directly (no measurement noise) while the components y(t) and z(t) 
are not observed (hidden variables). The results of the trajectory inference are shown in 

Fig.Ei;b,c,d). 



VI. CONCLUSIONS 



Is has been assumed up to now that a lack of observational data for predator popu- 
lations constituted a fundamental obstacle to the inference of ecological parameters from 



experimental data 



13, 



14. 



261 ] . A conclusion that can be drawn from the above results 



that this is not necessarily the case. Using the methods described above it is possible to 
reconstruct both invisible dynamics of predators population and the model parameters 
directly from measurements of prey populations, even those containing some measurement 
errors and uncertainties. 

Note, that much of the studies across many scientific disciplines rely on the analysis 



of the extre mal prop erties of the effective action similar to flHJ) in various function spaces 
(cf. e.g. [23! 13, Q, 31, 32]). For example, solution of the problems of large occa- 



sional deviations in noisy dynamical systems is given by the minimum of the functional 
© with no measurement term. Unlike the dynamical inference searching for the tra- 
jectory and model parameters that the system has with high probability, the theory of 
large deviations is concerned with an optimal fluctuation, or a least improbable path of 
the system to reach a remote state from the attractor during the rare event. However 
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FIG. 7: (a) Measured variable x(t) for the following system parameters: D\\ = 2.5, D22 = 3.0, 
and D33 = 3.5, r = 28, a = 10, b = 8/3. (b) and (c) Actual values of the unknown dynamical 
variables y(t) and z{t) are shown by the black line. Inferred values are shown by the red dots, (d) 
Actual trajectory of the system in 3D space of the variables x, y, z (black solid line) is compared 
with the inferred trajectory (red line). 



the solution of both problems provides a global minimum to the functional © in the 
space of dynamical trajectories and is equivalent to a certain Hamiltonian dynamics in an 
extended phase space. In the theory of large deviations the corresponding Hamiltonian 
is called Wentzel-Friedlin Hamiltonian Q|. The dynamical quantities appearing within 
this Hamiltonian theory have precise physical meaning and are accessible for direct ex- 
perimental measurements Very similar optimization problems also occur in 
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331 ] and the related 



the context of stochastic optimal control of large deviations jjy} 
Hamiltonian can sometimes be identified |32j with so-called Pontryagin Hamiltonian 
playing a key role in the theory of optimal control. We note however that the dynamical 
inference Hamiltonian Hy(x, p) (j4j) is of a qualitatively new type. It depends explicitly 
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on the time-varying measurement signal y(t) that plays a role of a 'control force' in the 
Hamiltonian dynamics. These considerations suggest that the proposed path-integral ap- 
proach to the problem of dynamical inference with hidden variables is a general one and 
sets the solution of this problem into the standard mathematical context. It is valid both 
in deterministic and stochastic case and is a natural generalization of the earlier ad hoc 
approach to the dynanhca, inference of de—tic systems Q Q. 

We believe that methods of Hamiltonian theory will provide a new topological insight 
to the solutions of complex problems of a dynamical inference with hidden variables. 
For example, in many cases the observed data are not sufficient to discriminate with 
high probability between the different values of system parameters and/or the forms of 
its hidden trajectory component. This corresponds to a certain 'degeneracy' set in the 
joint functional space (x(t), A4) where the functional 5* takes a constant maximum value. 
In general, the degeneracy set will be determined by the properties of the corresponding 
Lagrangian manifold associated with the auxiliary Hamiltonian system (@J) and conditions 
p(0) = p(T) = 0. We also note that whenever the dynamical inference converges to a right 
solution the inferred system trajectory and parameter values correspond to a sufficiently 
small momentum |p(£)| (of the order of noise intensities, D, N) and the minimum action 
S opt ~ 1. However in certain cases the global minimum of S corresponds to a much larger 
momentum |p(t)| ^> N, D and S opt ^> 1. Then the fitness to the data y is poor for 
any choice of parameters and trajectory. This implies that model assumptions (0) do not 
capture some important properties of the real- world system (a so called, "model error"). 
Overall, the locations of maxima of effective action S dominating the LPDF, their relative 
weights, as well as the topological structure in the joint functional space (x(t), M) answer 
the statistical question of what can or cannot be learned with a high likelihood about 
the system at hand given the available data and basic assumptions about the dynamical 
model. 

Our results also reveal a remarkable property of the dynamical inference with incom- 
plete measurements. In the absence of the model error the system parameters can be 
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learned with uncertainties ((5M a ) 2 ) that are not limited by the dynamical nor measure- 
ment noise intensities. In particular, ((5M a ) 2 ) < 1/T, for large T (see Appendix for the 
details of the derivation). On the other hand, the uncertainty in the inferred system tra- 
jectory ((5xj(t)) 2 ) is bounded from below by the dynamical and measurement noise. This 
effect can appear counterintuitive to a reader, because hidden variables and model param- 
eters are trading against each other in the log-likelihood that could seemingly cause 
the parameter and trajectory errors to be comparable with each other. The explanation 
for the above effect is that the trajectory points x(£ m ) at closely spaced instances of time 
t m are correlated with each other, those correlations are being extracted and accumulated 
during the dynamical inference which we presented in the paper and this leads to the 
shrinking of the parameter error with time below the noise level. 

The proposed method should be applicable to a broad range of problems in science 
and technology ranging from extracting parameters of molecular motors from the mea- 
surements of their progression along microtubules [i?! to the inference of a climate 
forcing mechanisms from reconstructed from the measurements of carbon dioxide in ocean 
seditnent Q. We also expeet this .nethod to be particular* asefa. to the eontext of physi- 
ological measurements where it is especially important to relate difficult-to-access param- 
eters to noninvasively- measured data 0, 4^|. The open question to be addressed in the 
near future is an extension of this theory to quantum and spatially extended systems. 



APPENDIX: BAYESIAN INFERENCE OF CONTINUOUS NOISE-DRIVEN 
DYNAMICAL SYSTEMS FROM INCOMPLETE MEASUREMENTS 

Within the Bayesian framework the problem of dynamical inference is to determine 
the conditional probability density functional (PDF) defined over the set of the unknown 
quantities (x(t),.M), subject to observations y(t). The later, so-called posterior PDF, 
p ps [x(t); M\ y(t)] is found using Bayes' theorem 

Pps [x(t);M\y} oc p oh [y(t)\K(t),M] Ppr [^(t),M]. (11) 
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Here the missing proportionality coefficient is simply a normalization factor. p pr [x(t), AA] 
is a so-called prior PDF that provides the joint statistical information about x(t) and 
At before the measurements y(t) were made. The prior PDF can be written in the 
form: p pr [x(t), Ai] = 'P[x(t)| Ai]po(At). Here Po(Ai) is some prior distribution of model 
parameters and P[x(t) | Ai] is the PDF of finding a realization of the dynamical trajectory 
x(t) for a given set of the system parameters Ai jISQ- This functional directly depends 
on the form of the stochastic dynamical model (0) and its parametrization. For example, 
in the case of the additive white noise considered in this functional has the form 



'P[x(f)|.V(] .x ( ( ^ ] ,WD 



-/C/2 



x exp 



T 



dt V • K 



x- 



K^rrVx-ic 



mpie, 

(12) 
(13) 



where x = x(t), K = K(x(t),C) and a coefficient of proportionality is a normalization 
factor. 

In Eq.fjTTj) p b[3^|x(t), M\ is a conditional PDF to observe the measurement signal y(t) 
for a specific realization of a system trajectory x(t) and model parameters Ai. For the 
continuous-time measurement model considered in (Q) this PDF takes the form 



Poh [y\x(t),M) 



detN 




-/C/2 



(14) 



x exp — 



dt (y(t) - B x(t)) N- 1 ( y(t) - B x(t) 



and describes the zero- mean Gaussian statistics of the measurement error /3(t) — y(t) — 
Bx(t). Returning back to the original discreet-time measurements 3^ = {y(tm), t m = 
mh, m = 1 : K.} one gets ((3k{t m )(3k'{t' m )) = N kk </h8 mm >. 

Prior PDF po[-M] usually represents a posterior PDF obtained as a result of the dy- 
namical inference based on the previous sets of data and on the expert knowledge about 
possible domains for the system parameters. Often the inference is entirely based on the 
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present set of data and the prior PDF is assumed to be completely uniform. In this case 
the posterior PDF p ps [x(t); JM\ y] in (|TT|) is usually referred to as a likelihood PDF. We 
denote the later as py[x(t); A4] and obtain: 

Py [x(t); M] cc Poh [y \x(t),M] V[x(t)\M\. (15) 

Using Eqs. (J13|) and (|14j) in (|15|) one can rewrite the likelihood PDF in the form 

p y [x(*);M] =A y exp(-S y [x(t);M\). (16) 

where the negative log-likelihood function Sy is given in (jHJ) and Ay is a normalization 
factor that does not depend on x(t) nor M.. 

a. Calculation of the expectation values using the maximum-likelihood estimation. In 
the asymptotic limit of a sufficiently long and dense data record y and low noise inten- 
sities the PDF py[x(t);J\A] is, generally, a very steep function of its arguments and the 
derivatives of Sy with respect to x(t) and M. are much greater then 1 (assuming all 
quantities are dimensionless) . In this case the expectation values of the system trajec- 
tory (x(£)) and the model parameters {(Ai a )} for a given measurement record y can be 
obtained by computing a maximum of the likelihood PDF in the joint space (x(t); A4). 
The conditions for the maximum have the form of the variational equations 

5Sy -0, (17) 



<Jx(t) 

that have to be solved simultaneously with the system of the algebraic equations 

dS >- , 



dc 

0, D = D J , (19) 



dS y _ n _ TV/' 

<9D 
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= 0, N = N T , (20) 
^ = 0. (21) 

dB 

Eqs. ()17p for the minimum of the action with respect to the trajectory components 
Xi(t) correspond to the Hamiltonian equations (jlj),© with the appropriate boundary 
conditions described in the main text. 

Inference of the model parameters was considered in j3] under the simplifying assump- 
tions that the measurement noise is zero, there are no hidden variables and the force field 
is linear in the parameters {c a } (but gen erally, nonlinear in x). Below we provide the gen- 
eralization of the results of the Ref. |23J] that allows us to infer the unknown parameters 
of the measurement model and does not relay on the linearity of K in c. 

The Eq. ()18|) gives the conditions of the minimum of Sy with respect to the parameters 
{c a } of the force field K(x(t),c). Using the Eq. (JHJ) we obtain these conditions in the 
following form: 

dt °— D 1 [x(f) - K(x(f), c)] = -J dtjj^V- K(x(t), c). (22) 

Solving the Eqs. (fTT?|) and (j2Uj) with respect to D and N, respectively, we obtain 

1 f T 

Dij = - dt [±i - Ki(x(t), c)] [ Xj - ^•(x(t), c)], (23) 

I fT L L 

N k i = y dt [y k (t) - V B kiXl (t)} [ yi (t) - V B ljXj {t)). 

J ° i=l j=l 



(24) 



Finally, the Eq. (jzl)) can be re- written in the explicit form of the system of linear equations 
for the matrix elements of B 

^2AS'B w = W Ul (25) 
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where 



A^'HN^W / dt Xi ,{t)xi{t), 



(26) 



W ki = J2(^~ 1 )ki [ dt yi (t)xi{t) 



(27) 



One solves simultaneously Eqs. (|22 p -([27| ) and the Hamiltonian equations (JU),® and se- 
lects the solution with the minimum value of Sy. 

b. Calculation of the variances. We now consider how the variances of the model 
parameters around the maximum of the LPDF depend on the noise intensity and length 
of the observation record. We focus on demonstrating of the main effect mentioned in 
Conclusion and for brevity we assume that there are two dynamical variables, one of them, 
x\, is hidden, and the other, X2, is observed with zero measurement error, X2{t) = 7/2 (^)- 
We assume that the correlation matrix D of the dynamical noise is diagonal with the small 
nonzero matrix elements Dj = Djj <C 1. We also assume that there is only one unknown 
model parameter c and it enters the expression for the vectorial force field K = K(x, c). 

The action functional in the reduced space Sy[xi(t),y2{t); c] = s[xi(t); c] has the form 



V(xut,c) = - JBj-fe («) - „(«); c)) - f ± >i °> . (29) 



At certain point (x° p (t), c opt ) where the action s[xi(t), c] reaches its minimum the condi- 
tions 5s/5x\{t) = and ds/dc = are satisfied. Consider now the trajectory x\ = x(t, c) 
corresponding to the partial minimum of the action with respect to x\ (t) with the value of 
the model parameter fixed: min Xl ( t ) s[xi(t),c] = s[x(t, c), c]. The Hamiltonian equations 




T 
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(jfj) for x(t, c) have the following form: 



±i = pi + K 1 (x 1 (t),y 2 (t); c), (30) 
Pl = _ pi ^i(^ft).y2(t);c) _ ^(xi(t) > t; C ) > (0)= (r)=0< (31) 

Of central interest for us here is the coefficient of expansion of the action s[x(t,c);c] in 
c - c opt 

S [x(t, C ), C ]«|(c-0 2 , (5c 2 ) = a~\ (32) 

that equals to the inverse variance of the model parameter c. To calculate this coefficient 
we expend the trajectory x(t, c) « (c - c opt ) f (i) + C ((c - c opt ) 2 ). Then, using (jHD , ({3Tj) 
we obtain in the leading order in _D n , .D 22 1 



a = a(T) 



2Di V 9c ctei 7 2_D 2 V 9c ctei 



(33) 



The function £(£) can be obtained from solution of the following system of equa- 
tions obtained by linearization of equations (J3~T|) around the Hamiltonian trajectory 
(x opt (t) , p opt (t) ) in the extended space (x,p) corresponding to the full minimum of the 
action s[x(t] c), c]: 

U J P 2 V ^1 / ^2 d Xl OC ' 1 j 

<W pt f)K opt 

m = V (t) +m ^±- + ^ (35) 
tj(0) = r,{T) = 0, (36) 

(all the partial derivatives of K\ above are evaluated at the arguments x\(t) = x opt (t) and 
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We note that the integrand in the expression for a(T) represents a sum of squares 
and therefore a(T) is a growing function of T, implying that the variance (5c 2 ) shrinks 
down with T. Assume now that the measurement of the trajectory component 1/2 (£) varies 
periodically for large t (system approaches a periodic attractor). This variation will play a 
role of a periodic forcing in Eqs. (}34j) - (j36|) and the long-time solutions of those equations, 
£(t),i](t) will also have a periodic component. That means that a(T) in (J33|) is growing 
at least linearly with T, an assertion made in the Conclusion. Dynamical inference with 
hidden variables in systems with chaotic attractors will be considered elsewhere. 
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